PROBLEM


Develop statistical, physical, and computer techniques for interpreting,
summarizing, and extrapolating oceanic and meteorologic data for reliable
estimation of the sound velocity distribution in the ocean. Specifically,
determine the length of time-series necessary to produce reliable long-time
estimates of sea-surface temperatures; and, as a corollary, find whether or
not systematic variations of sea-surface temperatures, over periods of
several years, are to be expected.


RESULTS 


1. Using autocorrelation and regression techniques, six time-series of
sea-surface temperature measurements were examined.


2. Plots of the 100R² statistic (percent variance explained by regression)
as a function of time-series record length for the six time-series records
considered lead to the conclusion that record lengths of 8 to 10 years are
necessary to obtain reliable long-time estimates of sea-surface temperature.
This conclusion is supported by the behavior of the autocorrelation
coefficients for the 40-year Scripps Pier record.


3. An examination of the annual average temperatures confirmed previously
published conclusions regarding the systematic year-to-year variability in
sea-surface temperatures. In addition it showed that such long-term
variability is not unusual or unexpected.


INTRODUCTION


This study is the fourth in a series of studies concerned with the analysis
of sea-surface temperature observations. The first study dealt with the effect
of missing data in long time-series of sea-surface temperature measurements on
certain regression and autocorrelation analyses.¹ The second examined the use
of regression models for time-space interpolation of sea-surface temperature
observations.² The third presented the results of an autocorrelation,
regression, and trend analysis of time-series of sea-surface temperature
measurements made at six locations representing different oceanographic
conditions and considered the difficulties encountered in applying these
techniques to oceanographic data samples.³

This study considers the oceanographic aspects of the last of the above
studies.³ In particular, it examines the length of time-series necessary to
produce reliable long-time estimates of sea-surface temperature. In addition,
it considers the corollary question of whether or not systematic variations of
sea-surface temperatures, over periods of several years, are to be expected.


TIME-SERIES LENGTH 


The length of time-series necessary to produce reliable long-term estimates
of sea-surface temperatures interests oceanographers concerned with
observational programs for obtaining information necessary for establishing
average sea-surface temperatures.

The time-series of data used to obtain insight into this question are
listed in table 1. Van Vliet and Anderson³ concluded from their
autocorrelation, regression, and trend analyses of these time-series that the
following regression model, with k = 2, provided a good statistical fit to the
observed daily sea-surface temperatures:


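$$ \hat{T} = \beta_0 + \sum_{i=1}^{k} \alpha_i \sin[2\pi i (D - \theta_i)/365] + \epsilon \tag{1A} $$

or expanding,

$$ \hat{T} = \beta_0 + \sum_{i=1}^{k} [\beta_{2i-1} \sin(2\pi i D/365) + \beta_{2i} \cos(2\pi i D/365)] + \epsilon \tag{1B} $$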


¹Superscript numbers denote references in the list at the end of this report.


where D is time measured in days from some arbitrary origin, and T̂ is the
fitted value of the surface temperature. Fitting equation (1B) to the
observed surface temperature, T, using the method of least squares yields
estimates of the regression coefficients, β, and an estimate of the variance
of ε. The amplitude α and phase θ can be obtained from the β's. The quantity
ε is the random error or residual term.
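
The step from the β's to amplitude and phase follows from expanding the sine of a difference. As a sketch, assuming the sine and cosine coefficients of the i-th harmonic in (1B) are indexed β_{2i-1} and β_{2i} (an indexing convention adopted here for illustration), matching terms between (1A) and (1B) gives:

```latex
\alpha_i \sin\!\left[\frac{2\pi i (D-\theta_i)}{365}\right]
  = \alpha_i \cos\!\left(\frac{2\pi i\,\theta_i}{365}\right) \sin\!\left(\frac{2\pi i D}{365}\right)
  - \alpha_i \sin\!\left(\frac{2\pi i\,\theta_i}{365}\right) \cos\!\left(\frac{2\pi i D}{365}\right)

\beta_{2i-1} = \alpha_i \cos\!\left(\frac{2\pi i\,\theta_i}{365}\right), \qquad
\beta_{2i}   = -\alpha_i \sin\!\left(\frac{2\pi i\,\theta_i}{365}\right)

\alpha_i = \sqrt{\beta_{2i-1}^{2} + \beta_{2i}^{2}}, \qquad
\theta_i = \frac{365}{2\pi i}\,\operatorname{atan2}\!\left(-\beta_{2i},\ \beta_{2i-1}\right)
```

The phase is determined only up to a multiple of 365/i days, so a convention (for example, 0 ≤ θ_i < 365/i) is needed when comparing phases between years.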


TABLE 1. LOCATION OF SEA-SURFACE TEMPERATURE TIME-SERIES

Location            Position      Ocean            Time Period    Record Length
Weather Ship PAPA   50°N 145°W    North Pacific    1/56 - 8/62    6 yr 7 mo
Weather Ship ECHO   35°N 48°W     North Atlantic   9/49 - 9/56    7 yr
Cape St. James      52°N 131°W    North Pacific    1/35 - 1/61    21 yr (5 yr missing)
Triple Island       54°N 131°W    North Pacific    1/40 - 1/61    21 yr
Langara Island      54°N 133°W    North Pacific    1/41 - 1/61    20 yr
Scripps Pier        33°N 117°W    North Pacific    1/21 - 1/61    40 yr



An integral part of any estimation problem is the determination of the
reliability of the estimate as measured by the variability of observed data
about the estimated values. As a measure of this variability consider the
statistic R², the fraction of variability explained by a statistical fit.
Equation (1B) was fitted to the Scripps Pier data using samples within the 40
years of lengths 1, 5, 8, 10, 20, and 40 years. This resulted in the following
samples: forty 1-year, eight 5-year, five 8-year, four 10-year, two 20-year,
and one 40-year.
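
The windowed fitting described above can be sketched as follows. This is a minimal illustration, not the original computation: it assumes non-overlapping consecutive windows, a 365-day year, and ordinary least squares via NumPy, and it uses a synthetic record (the names `design_matrix`, `fit_r2`, and `window_r2` are hypothetical).

```python
import numpy as np

def design_matrix(days, k=2, period=365.0):
    """Columns: constant, then sin and cos of each of the k harmonics."""
    cols = [np.ones_like(days, dtype=float)]
    for i in range(1, k + 1):
        arg = 2.0 * np.pi * i * days / period
        cols.append(np.sin(arg))
        cols.append(np.cos(arg))
    return np.column_stack(cols)

def fit_r2(days, temps, k=2):
    """Least-squares fit of the harmonic model; returns (coefficients, R^2)."""
    X = design_matrix(days, k)
    beta, *_ = np.linalg.lstsq(X, temps, rcond=None)
    resid = temps - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((temps - temps.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

def window_r2(days, temps, years_per_window, k=2, days_per_year=365):
    """R^2 for consecutive non-overlapping windows of the given length."""
    n = years_per_window * days_per_year
    n_windows = len(days) // n
    return [fit_r2(days[j * n:(j + 1) * n], temps[j * n:(j + 1) * n], k)[1]
            for j in range(n_windows)]

# Synthetic stand-in for a 40-year daily record (not real Scripps Pier data):
# annual and semiannual terms, a random year-to-year offset, and daily noise.
rng = np.random.default_rng(0)
days = np.arange(40 * 365)
year_offset = np.repeat(rng.normal(0.0, 0.6, 40), 365)
temps = (17.0
         + 3.0 * np.sin(2 * np.pi * (days - 120) / 365)
         + 0.5 * np.sin(4 * np.pi * (days - 30) / 365)
         + year_offset
         + rng.normal(0.0, 0.8, days.size))

lengths = (1, 5, 8, 10, 20, 40)
counts = {L: len(window_r2(days, temps, L)) for L in lengths}
mean_r2 = {L: float(np.mean(window_r2(days, temps, L))) for L in lengths}
```

With 40 years of data the window counts come out as 40, 8, 5, 4, 2, and 1, matching the sample counts quoted in the text. In this synthetic case the single-year fits absorb the year-to-year offset in the constant term, which is why their mean R² exceeds that of the 40-year fit.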

Figure 1 summarizes the R² statistic for records of various lengths for
the Scripps Pier data. In general, a single year's data are expected to yield
a higher R² than would several years of data, where year-to-year variations
would give a poorer fit, although in the forty single-year fits there are some
years with poorer fits than those for longer periods. To compensate for the
few years of poor fits there are many years of excellent fits. The fact that
for 33 years R² was greater than 0.81 and for 23 years was greater than 0.86
substantiates this conclusion.

To compare R²'s for the various record lengths, L, the mean R²'s for the
available runs of each length have been computed and 100R²'s have been plotted
on figure 2. As expected the mean R² is a decreasing function of the length of
record, L. More unexpected is the actual shape of the curve. From L = 1 the
curve drops off sharply to somewhere between L = 5 and L = 10, from which
point on there is a negligible decrease in R².

The mean R² is plotted in preference to the mean R or to the mean of
Fisher's Z = ½ logₑ[(1 + R)/(1 - R)], since R² is easiest to interpret. In
addition, the relationship of R to R² in the range of consideration of R² is
so nearly linear that a plot of mean R with appropriate scale changes cannot
be distinguished from that of mean R². The relationship of Z to R² is such
that the curvature of figure 2 would be even more emphasized if Fisher's
statistic were plotted. The distribution of R² is asymptotically normal for
R ≠ 0.⁴

Because of the dependence of the average R²'s among the samples from
which they were computed, and because of the autocorrelated residuals, the
development of confidence limits for these average R²'s seems intractable.
However, as a rough estimate of their variability, one standard deviation of
the mean R² is plotted as a vertical bar in figure 2. These standard
deviations are given by the formula:


$$ \sigma(\overline{R^2}) = \frac{2R(1 - R^2)}{N^{1/2}(365)^{1/2}} \tag{2} $$


They are computed under the assumption that repeated sampling over the same
N = 40 year period at Scripps Pier is possible. In this conceptually possible
but practically impossible situation, the deviations about the same
mathematical model of regression are assumed to be independent among the
repeated samples.

Under these assumptions the confidence limits for the plotted points are 
narrow, and it is concluded that the sharp change in slope of the curve in the 
region 5 < L < 10 is real. 

Attention is called to the systematic change in the absolute magnitude of
the average R²'s for the data for PAPA, ECHO, Langara Island, Cape St. James,
and Triple Island. This change appears to be associated with the exposure, or
"continentality," of the station: PAPA and ECHO are typical of open-ocean
locations; Triple Island is much like an open-ocean location, being a very
small coastal island; and Scripps Pier is least like an open-ocean location,
with the observations being made at the end of a 1000-foot pier. Cape St.
James and Langara Island have a "continentality" between Scripps Pier and
Triple Island.

Figure 1. R² statistic for Scripps Pier data for records of varying length.

The implications of figure 2 are:


1. A record of daily surface temperatures of 10-year length is adequate 
for fitting a regression curve to estimate long-term variability. 


2. The unexplained long-term variability, that is, variability unexplained
by the regression model, varies from about 23 percent at Scripps Pier, for a
sample longer than 10 years, to less than 5 percent at PAPA and ECHO, both
one-year samples taken at exposed open-ocean locations. Since R² is not
degraded by extending a record beyond 10 years, the estimates of regression
coefficients based on 10 years are as adequate as those that might be obtained
from a longer record. In the same light, records of 5 years or less reflect
shorter-term variability in temperature and thus give an improved fit as
record length decreases.

Additional information on the length of time-series necessary for obtaining
long-term estimates of sea-surface temperature may be obtained from an
examination of the autocorrelation function available from the 40 years of
Scripps Pier record.

For the various samples of Scripps Pier data, the autocorrelation functions
were determined for the time-series consisting of the differences of the
observed surface temperatures and the temperatures estimated by the fit of
combined annual and semiannual terms, equation (1B) with k = 2. The functions
were computed for lags at intervals of 5 days up to 900 days in most cases.
Consider first the autocorrelation functions for the eight different 5-year
samples of data plotted in figure 3. There is considerable variability among
the functions. It would be desirable to compute some measure of this
variability, and compare it with the corresponding variability of the
autocorrelation functions for the 8-, 10-, and 20-year records of figure 4.
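
A correlogram of this kind can be sketched as below. The estimator is an assumption: the text does not say which form of the sample autocorrelation was used, so the common version that divides each lagged cross-product by the lag-0 sum of squares is shown. The residual series is synthetic first-order autoregressive noise, and the function name `autocorrelation` is hypothetical.

```python
import numpy as np

def autocorrelation(resid, lags):
    """Sample autocorrelation of a residual series at the given lags (days).

    Each lagged sum of cross-products is divided by the lag-0 sum of squares
    of the mean-removed series (one common estimator among several).
    """
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    denom = float(x @ x)
    return np.array([float(x[:x.size - lag] @ x[lag:]) / denom for lag in lags])

lags = np.arange(5, 905, 5)  # 5, 10, ..., 900 days

# Synthetic 5-year residual series standing in for observed-minus-fitted
# temperatures: persistent first-order autoregressive noise.
rng = np.random.default_rng(1)
phi = 0.97
shocks = rng.normal(size=5 * 365)
resid = np.empty_like(shocks)
resid[0] = shocks[0]
for t in range(1, shocks.size):
    resid[t] = phi * resid[t - 1] + shocks[t]

acf = autocorrelation(resid, lags)
```

Applying the same function to each of the eight 5-year residual series would give a family of curves comparable to those plotted in figure 3.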

Before computing any such measure, we have to make a decision as to the
range of lags to use in the comparison. The value of the standard deviation of
the nonsignificant autocorrelations, σ, for the 40 years of Scripps Pier data
is 0.0293.³ For a 10-year record σ = 0.0586, and the 95 percent significance
values are ±0.115. For reasons previously discussed, a 10-year record of
sea-surface temperature is needed in order to obtain reliable estimates of the
long-term variability. Thus it is not necessary to consider 5- or 8-year
records in selecting the range of lags to use. If we assume the 40-year
autocorrelation function is close to the true function, lags out to 145 days
yield autocorrelations greater than 0.115 and can be used to compare the sets
of functions.
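
The 10-year figures quoted above can be reproduced from the 40-year value. The sketch below assumes that the variance of an autocorrelation coefficient is inversely proportional to record length (the assumption stated later in this section) and that the 95 percent values are two-sided normal limits at 1.96 standard deviations; the factor 1.96 is an inference from the quoted numbers, not a value stated in the text.

```python
import math

sigma_40 = 0.0293            # standard deviation quoted for the 40-year record
years_full, years_sub = 40, 10

# Variance assumed inversely proportional to record length, so the
# standard deviation scales with sqrt(40 / L).
sigma_10 = sigma_40 * math.sqrt(years_full / years_sub)

# Two-sided 95 percent limit under a normal approximation (assumed).
limit_95 = 1.96 * sigma_10

print(round(sigma_10, 4), round(limit_95, 3))  # prints: 0.0586 0.115
```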

Figure 3. Autocorrelation functions for 5-year records of Scripps Pier data.

For the set of autocorrelation functions based on samples of length L
years and for each lag τ = 5(5)145 days, we determine the following sum of
squares:

$$ Q(\tau, L) = \sum_{j=1}^{40/L} [C_j(\tau) - \bar{C}(\tau)]^2 \tag{3} $$

where C_j(τ) is the autocorrelation coefficient for lag τ of the j-th of 40/L
sets, and C̄(τ) is the mean of the 40/L coefficients with the same lag τ. By
analogy with normal distribution theory, the quantity Q(τ,L)/σ²(τ,L) is like a
chi-square variable with (40/L) - 1 degrees of freedom, where σ²(τ,L) is the
variance of C(τ) for a sample of length L. Assume the Q(τ,L)'s are
independent, and that σ²(τ,L) is inversely proportional to sample length and
the same for all τ. That is, σ²(τ,L) = kσ²/L, where k is any proportionality
constant. Then

$$ \frac{L}{k\sigma^2} \sum_{\tau} Q(\tau, L) $$

is like a chi-square variable with ν = 29[(40/L) - 1] degrees of freedom. For
two different values of L, L₁ and L₂, the ratio

$$ F = \frac{L_1 \sum_{\tau} Q(\tau, L_1)/\nu_1}{L_2 \sum_{\tau} Q(\tau, L_2)/\nu_2} $$

is like an F-variable with ν₁ and ν₂ degrees of freedom.


The "mean square," L Σ_τ Q(τ,L)/ν, and the F-ratios using the mean square
for 20 years as the denominator, are shown in table 2. Assuming a robust
F-test, none of these ratios is significant (though less than 1) and it is
concluded that the variability in the autocorrelation functions is about as
expected.
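
The mean-square and F-ratio computation can be sketched as follows. The correlograms here are hypothetical random stand-ins, so the resulting ratios say nothing about the Scripps Pier data; only the degrees of freedom, which depend solely on the number of lags and samples, correspond to the text. The helper names `sum_of_squares`, `mean_square`, and `fake_acfs` are illustrative.

```python
import numpy as np

def sum_of_squares(acfs):
    """Q(tau, L) for each lag; acfs has shape (n_samples, n_lags)."""
    acfs = np.asarray(acfs, dtype=float)
    return ((acfs - acfs.mean(axis=0)) ** 2).sum(axis=0)

def mean_square(acfs, L):
    """Return (L * sum over lags of Q / nu, nu), nu = n_lags * (n_samples - 1)."""
    n_samples, n_lags = np.shape(acfs)
    nu = n_lags * (n_samples - 1)
    return L * sum_of_squares(acfs).sum() / nu, nu

lags = np.arange(5, 150, 5)  # 5(5)145 days, i.e. 29 lags
record_lengths = (5, 8, 10, 20)
dof = {L: len(lags) * (40 // L - 1) for L in record_lengths}

# Hypothetical correlograms: a smooth decay plus noise whose variance is
# inversely proportional to record length, as assumed in the text.
rng = np.random.default_rng(2)

def fake_acfs(L):
    n_samples = 40 // L
    base = np.exp(-lags / 60.0)
    noise = rng.normal(0.0, 0.1 / np.sqrt(L), size=(n_samples, lags.size))
    return base + noise

ms = {L: mean_square(fake_acfs(L), L) for L in record_lengths}

# F-ratios with the 20-year mean square as the denominator.
f_ratio = {L: ms[L][0] / ms[20][0] for L in (5, 8, 10)}
```

Each ratio would then be compared with the F distribution on (ν_L, ν_20) degrees of freedom, here (203, 29), (116, 29), and (87, 29).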


TABLE 2. VARIABILITY IN AUTOCORRELATION FUNCTIONS 


Mean Square

Figure 4. Autocorrelation functions for 8-, 10-, 20-, and 40-year records
of Scripps Pier data.

The investigation of the square of the multiple correlation coefficient
suggested that about 10 years of record are sufficient for certain
curve-fitting and estimation problems. It has just been concluded that there
is no such break in the variability of the autocorrelation coefficient as a
function of record length. Thus a decision as to the sample length necessary
to obtain useful estimates of the autocorrelation function must be made on
some absolute basis, or a cost function must be introduced such that a
combination of increasing cost and decreasing variability with sample size
results in an optimization problem.

Two additional comments are pertinent. First, figure 4 shows the
autocorrelation functions for samples of 8, 10, and 20 years, respectively.
Attention is called to the 10-year records. It appears that the
autocorrelation functions agree well out to a lag of about 80 days. Second,
autocorrelation functions for the same record lengths have been averaged by
lags, and are shown in figure 5. There is a strong indication that
autocorrelation functions from finite samples are biased. Restricting the
discussion to lags out to 80 days, and assuming the autocorrelation function
for the 40-year sample is close to the true function, the mean functions for
5- and 8-year samples are badly biased, with little bias indicated in the 10-
and 20-year mean functions. It is concluded that 10-year samples provide
consistent and usable autocorrelation functions out to a lag of 80 days.


Figure 5. Average autocorrelation functions for various record lengths for
Scripps Pier data. (Axes: autocorrelation coefficient versus lag in days.)


SYSTEMATIC VARIATION OF SEA-SURFACE
TEMPERATURE OVER PERIODS OF SEVERAL YEARS


In connection with the question of whether or not sea-surface temperatures
vary significantly over periods of several years, four specific time periods
are examined here in some detail:


1. 1947 to 1956, which has been referred to as displaying "A uniform
monotony of conditions in at least the eastern North Pacific that is scarcely
suggested by any similar series of years in this century."

2. 1936 to 1956, where there are indications of a cooling trend.

3. 1957 and 1958, recognized as "the changing years."⁵

4. 1930 to 1935, another period, unique to this century, which appears to
contain long-term oscillations.


In the previous discussion on trends by Van Vliet and Anderson,³ a
variety of statistical considerations led to the general conclusion that no
trend existed in the records for any of the locations examined and that
quantities such as the annual average temperature (β₀), annual amplitude (α₁),
annual phase (θ₁), and percent variance explained by regression (100R²) all
behaved as independent random variables are expected to behave. In addition it
was pointed out that this conclusion does not deny the existence of real
year-to-year differences in the ocean, but rather emphasizes that these
differences are not unexpected from the viewpoint of statistics and thus are
not considered unusual or improbable events.

First the period 1947 to 1956 is examined in the light of the statistical
parameters developed in this analysis. Figure 6 contains a plot of the average
annual temperatures (β₀) for four eastern North Pacific locations. During this
decade, for 8 of the 10 years at all four stations, the β₀'s were below the
median value and for 2 years were at the median or slightly above, suggesting
that on the average the decade was cooler than normal. Figure 7 contains
parameters that determine the shape of the seasonal variation (α₁, α₂, θ₁,
θ₂). A study of these four parameters does not suggest anything unusual about
the shape of the seasonal surface temperature variation during this decade.
Figure 8 contains the percent variance explained by regression and the
standard deviation, both parameters concerned with the variability. Again a
study of these factors does not suggest anything unusual in the amount or
degree of variability. Thus, this analysis suggests that the surface
temperature variation during this decade was unusual, as compared with other
decades observed in this century, only in that the average temperature was
slightly below the median value.

Although too much after-the-fact analysis of data is contrary to
statistical philosophy, sometimes it is of interest to do such analysis.
Specifically, for the Scripps Pier data shown on figure 6 there is a
suggestion of a cooling trend for the years 1939 through 1956. Applying the
theory of runs to this period, the following sequence is obtained:


AAAAA BB AA BBB A BB A BB 


This sequence has eight runs. The critical number of runs at the 5 percent
probability level for 18 observations is six, and it is concluded that no
trend exists even in this selected period of time. The applicable median β₀ is
16.60°C.
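
The run count can be checked mechanically. In the sketch below the sequence and the critical value of six are taken from the text; reading "A" as a year above the median and "B" as a year at or below it, and treating a run count at or below the critical value as evidence of a trend, are assumptions made for illustration. The function names are hypothetical.

```python
from itertools import groupby

def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    return sum(1 for _ in groupby(labels))

def above_below(values, median):
    """Label each value 'A' if above the median, otherwise 'B' (assumed coding)."""
    return "".join("A" if v > median else "B" for v in values)

# Sequence quoted in the text for 1939 through 1956, with spaces removed.
sequence = "AAAAA BB AA BBB A BB A BB".replace(" ", "")

n_obs = len(sequence)
n_runs = count_runs(sequence)
critical_runs = 6  # value quoted in the text for 18 observations, 5 percent level

# Too few runs would indicate clustering, that is, a trend (assumed convention).
trend_indicated = n_runs <= critical_runs

print(n_obs, n_runs, trend_indicated)  # prints: 18 8 False
```

Given the annual averages themselves, `above_below(values, 16.60)` would regenerate the letter sequence before counting.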

The years 1957 and 1958 have been recognized from a consideration of
several natural science parameters as the changing years.⁵ These years
appeared to conclude the 1947-to-1957 decade, a decade of below-median
sea-surface temperatures. On figure 6, the β₀'s for the succeeding years, 1959
to 1962 inclusive, have been included for Scripps Pier. In 1959 the annual
average temperature continued to increase for the third successive year,
followed by an abrupt decrease in 1960 and lesser decreases in 1961 and 1962.
This same pattern, though less marked, occurred for the three island
locations. Thus, it appears that in 1956 a long-term oscillatory variation, of
period at least 6 years, began and that it was still in progress in 1962.

In examining the Scripps Pier β₀'s in figure 6, we find that the period
1930 to 1935 also appears to contain an oscillatory term with a period of
several years. In figure 3 are plotted the eight correlograms, obtained after
removing the annual and semiannual oscillatory terms, for 5-year periods at
Scripps Pier. Although these correlograms are plotted out to lags of only 150
days, they were computed out to lags of 900 days. A study of the eight 900-day
lag correlograms shows one to be somewhat different from the remaining seven:
the correlogram for the 1931-to-1935 period. The correlogram for this period
for lags out to 1400 days (fig. 9) shows a peak in the autocorrelation
coefficient of 0.46 at a lag of about 1170 days, or a period of about 3.2
years. Since the points are scattered in the neighborhood of this peak as well
as in the neighborhood of the minimum at lag about 750 days, another estimate
of period length is given by twice the difference in lags between the
up-crossing at lag 960 days and the down-crossing at lag 265 days, or about
3.8 years. Either period confirms the intuitive conclusion reached from an
examination of figure 6.
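
The two period estimates follow from simple arithmetic on the quoted lags. For an oscillatory correlogram, a down-crossing and the next up-crossing are half a period apart, so the crossing-based estimate doubles their separation (a 365-day year is assumed):

```latex
\text{peak estimate:}\quad
\frac{1170\ \text{days}}{365\ \text{days/yr}} \approx 3.2\ \text{yr}

\text{crossing estimate:}\quad
\frac{2\,(960 - 265)\ \text{days}}{365\ \text{days/yr}}
  = \frac{1390}{365}\ \text{yr} \approx 3.8\ \text{yr}
```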

It thus appears reasonable to conclude that sea-surface temperatures will
vary significantly over periods of several years and that these occurrences
are not unexpected or improbable.

Figure 7. Annual and semiannual amplitudes and phases for all stations
as a function of time.

Figure 8. Percent variance explained and standard deviation
of residuals for annual records for all stations.


SUMMARY AND CONCLUSIONS


Using autocorrelation and regression techniques, six time-series of
sea-surface temperature measurements were examined to determine the length of
time-series necessary for obtaining reliable estimates of sea-surface
temperature and to determine whether or not systematic variations of annual
average sea-surface temperatures over periods of several years are to be
expected.

Plots of the 100R² statistic (percent variance explained by regression) as
a function of time-series record length for the six time-series records
considered lead to the conclusion that record lengths of 8 to 10 years are
necessary to obtain reliable long-time estimates of sea-surface temperature.
Additional support for this conclusion was obtained from an examination of the
behavior of the autocorrelation coefficients for the 40-year Scripps Pier
record.

An examination of the annual average temperatures confirmed previously
published conclusions regarding the systematic year-to-year variability in
sea-surface temperatures. In addition it showed that such long-term
variability is not unusual or unexpected.